A Compact Index for Order-Preserving Pattern Matching
Order-preserving pattern matching was introduced recently but it has already
attracted much attention. Given a reference sequence and a pattern, we want to
locate all substrings of the reference sequence whose elements have the same
relative order as the pattern elements. For this problem we consider the
offline version in which we build an index for the reference sequence so that
subsequent searches can be completed very efficiently. We propose a
space-efficient index that works well in practice despite its lack of good
worst-case time bounds. Our solution is based on the new approach of
decomposing the indexed sequence into an order component, containing ordering
information, and a delta component, containing information on the absolute
values. Experiments show that this approach is viable, faster than the
available alternatives, and it is the first one offering simultaneously small
space usage and fast retrieval.
Comment: 16 pages. A preliminary version appeared in the Proc. IEEE Data
Compression Conference, DCC 2017, Snowbird, UT, USA, 201
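To make the problem statement concrete, here is a minimal sketch of order-preserving matching done by a naive scan. It is not the index proposed in the abstract; the function names `rank_signature` and `op_match` are hypothetical, and the treatment of ties (equal values get equal ranks) is an assumption.

```python
def rank_signature(seq):
    """Map each element to its rank among the distinct values of seq."""
    rank = {v: r for r, v in enumerate(sorted(set(seq)))}
    return [rank[v] for v in seq]


def op_match(text, pattern):
    """Return start positions of windows of text with the same relative order as pattern.

    Naive scan: every window is re-ranked from scratch, so this only
    illustrates the definition and is far slower than an index.
    """
    m = len(pattern)
    target = rank_signature(pattern)
    return [
        i
        for i in range(len(text) - m + 1)
        if rank_signature(text[i:i + m]) == target
    ]


text = [10, 22, 15, 30, 20, 18, 27]
pattern = [1, 3, 2]  # shape: low, high, middle
matches = op_match(text, pattern)
print(matches)
```

The windows [10, 22, 15] and [15, 30, 20] share the "low, high, middle" shape of the pattern, so positions 0 and 2 are reported even though the absolute values differ.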
Compressed Spaced Suffix Arrays
Spaced seeds are important tools for similarity search in bioinformatics, and
using several seeds together often significantly improves their performance.
With existing approaches, however, for each seed we keep a separate linear-size
data structure, either a hash table or a spaced suffix array (SSA). In this
paper we show how to compress SSAs relative to normal suffix arrays (SAs) and
still support fast random access to them. We first prove a theoretical upper
bound on the space needed to store an SSA when we already have the SA. We then
present experiments indicating that our approach works even better in practice.
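As a rough illustration of what a spaced suffix array stores, the sketch below sorts text positions by the characters selected by a seed. It builds a plain, uncompressed SSA and does not implement the compression relative to the SA described in the abstract. The seed encoding ("1" for a position that is compared, any other character for a don't-care), the restriction to positions where the whole seed fits, and the name `spaced_suffix_array` are all assumptions.

```python
def spaced_suffix_array(text, seed):
    """Sort start positions by the characters at the seed's '1' positions.

    Only positions where the full seed fits inside text are included;
    ties are broken by position so the result is deterministic.
    """
    care = [j for j, bit in enumerate(seed) if bit == "1"]
    n_pos = len(text) - len(seed) + 1
    return sorted(
        range(n_pos),
        key=lambda i: (tuple(text[i + j] for j in care), i),
    )


ssa = spaced_suffix_array("ACGTACGA", "101")
print(ssa)
```

With seed "101" each position is keyed by its first and third characters only, so positions 0 and 4 (both keyed by "A", "G") end up adjacent in the sorted order.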
On the ordering of sparse linear systems
In this paper we consider the algorithms for transforming an n × n sparse matrix A into another matrix B such that Gaussian elimination applied to B takes time asymptotically less than n³. These algorithms take the sparse matrix A as input, and return a pair of permutation matrices P, Q such that B = PAQ has a small bandwidth, or some other desirable form. We study the average effectiveness of these algorithms by using random matrices with Θ(n) nonzero elements. We prove that with high probability these algorithms cannot produce a reduction of the asymptotic cost of the standard Gaussian elimination algorithm. We also study the effectiveness of these algorithms for ordering very sparse matrices. We show that there exist matrices with 3n nonzeros for which reordering rows and columns does not reduce the asymptotic cost of Gaussian elimination. We also prove that each matrix with at most two nonzeros in each row and in each column can be transformed into a banded matrix with bandwidth five.
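The effect of a reordering B = PAQ on bandwidth can be sketched as follows. The sparse matrix is represented as a set of (row, column) pairs for its nonzeros, and bandwidth is taken here to mean the largest |i - j| over the nonzeros; the paper may use a different convention, so that definition, like the names `bandwidth` and `permute`, is an assumption.

```python
def bandwidth(nonzeros):
    """Largest distance |i - j| of a nonzero entry from the main diagonal."""
    return max((abs(i - j) for i, j in nonzeros), default=0)


def permute(nonzeros, row_perm, col_perm):
    """Apply row and column permutations to the nonzero pattern.

    row_perm[i] is the new index of row i, col_perm[j] the new index of column j.
    """
    return {(row_perm[i], col_perm[j]) for i, j in nonzeros}


# A 4 x 4 pattern whose first and last rows/columns are coupled.
a = {(0, 0), (0, 3), (1, 1), (2, 2), (3, 0), (3, 3)}
perm = [0, 2, 3, 1]  # move index 3 next to index 0
b = permute(a, perm, perm)
print(bandwidth(a), bandwidth(b))
```

Here the same permutation is used for rows and columns, which moves the two coupled indices next to each other and brings the bandwidth down from 3 to 1.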
XBWT Tricks
The eXtended Burrows-Wheeler Transform (XBWT) is a
data transformation introduced in [Ferragina et al., FOCS 2005] to
compactly represent a labeled tree and simultaneously support navigation
and path-search operations over its label structure.
A natural application of the XBWT is to store a dictionary of strings.
A recent extensive experimental study [Martínez-Prieto et al.,
Information Systems, 2016] shows that, among the available string dictionary
implementations, the XBWT is attractive because of its good tradeoff
between small space usage, speed, and support for substring searches.
In this paper we further investigate the use of the XBWT for storing a
string dictionary. Our first contribution is to show how to add suffix links
(aka failure links) to an XBWT string dictionary. For an XBWT dictionary
with n internal nodes our suffix links can be traversed in constant time
and only take 2n + o(n) bits of space.
Our second contribution is a set of practical construction algorithms for the
XBWT, including the additional data structure supporting the
traversal of suffix links. Our algorithms build on the many well-engineered
algorithms for Suffix Array and BWT construction and offer different
tradeoffs between running time and working space.
On Computing the Entropy of Cellular Automata
We study the topological entropy of a particular class of dynamical systems: cellular automata. The topological entropy of a dynamical system (X,F) is a measure of the complexity of the dynamics of F over the space X. The problem of computing (or even approximating) the topological entropy of a given cellular automaton is algorithmically undecidable (Ergodic Theory Dynamical Systems 12 (1992) 255). In this paper, we show how to compute the entropy of two important classes of cellular automata, namely linear and positively expansive cellular automata. In particular, we prove a closed formula for the topological entropy of D-dimensional (D ≥ 1) linear cellular automata over the ring Z_m (m ≥ 2) and we provide an algorithm for computing the topological entropy of positively expansive cellular automata.
Multiple seeds sensitivity using a single seed with threshold
Spaced seeds are a fundamental tool for similarity search in biosequences. The best sensitivity/selectivity trade-offs are obtained using many seeds simultaneously: this is known as the multiple seed approach. Unfortunately, spaced seeds use a large amount of memory and the available RAM is a practical limit to the number of seeds one can use simultaneously. Inspired by some recent results on lossless seeds, we revisit the approach of using a single spaced seed and considering two regions homologous if the seed hits in at least t sufficiently close positions. We show that by choosing the locations of the don't-care symbols in the seed using quadratic residues modulo a prime number, we derive single seeds that, when used with a threshold t > 1, have competitive sensitivity/selectivity trade-offs, indeed close to the best multiple seeds known in the literature. In addition, the choice of the threshold t can be adjusted to modify sensitivity and selectivity a posteriori, thus enabling a more accurate search in the specific instance at issue. The seeds we propose also exhibit robustness and allow flexibility in usage.
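The idea of a single seed with a hit threshold can be sketched as below. The abstract does not give the exact construction, so the rule used here (a seed of length p with a don't-care "*" at every position that is a nonzero quadratic residue modulo p, and a "1" elsewhere) is an assumption made for illustration only, as are the function names, the gap-free alignment of the two sequences, and the `window` parameter that stands in for "sufficiently close".

```python
def quadratic_residue_seed(p):
    """Hypothetical seed of length p: '*' at nonzero quadratic residues mod p."""
    residues = {(x * x) % p for x in range(1, p)}
    return "".join("*" if i in residues else "1" for i in range(p))


def seed_hits(a, b, seed):
    """Positions where a and b agree on every '1' position of the seed."""
    care = [j for j, c in enumerate(seed) if c == "1"]
    n_pos = min(len(a), len(b)) - len(seed) + 1
    return [i for i in range(n_pos) if all(a[i + j] == b[i + j] for j in care)]


def passes_threshold(hits, t, window):
    """True if some t hits start within `window` positions of each other.

    hits must be sorted in increasing order, as returned by seed_hits.
    """
    return any(
        hits[k + t - 1] - hits[k] <= window
        for k in range(len(hits) - t + 1)
    )


seed = quadratic_residue_seed(7)
hits = seed_hits("AAAAAAAAA", "ACAAAAAAA", seed)
print(seed, hits, passes_threshold(hits, 2, 2))
```

Because t and `window` are applied after the hits are collected, they can be changed without recomputing the hits, which mirrors the a posteriori adjustment mentioned in the abstract.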